Locally Non-Linear Learning for Statistical Machine Translation via Discretization and Structured Regularization
نویسندگان
چکیده
Linear models, which support efficient learning and inference, are the workhorses of statistical machine translation; however, linear decision rules are less attractive from a modeling perspective. In this work, we introduce a technique for learning arbitrary, rule-local, nonlinear feature transforms that improve model expressivity, but do not sacrifice the efficient inference and learning associated with linear models. To demonstrate the value of our technique, we discard the customary log transform of lexical probabilities and drop the phrasal translation probability in favor of raw counts. We observe that our algorithm learns a variation of a log transform that leads to better translation quality compared to the explicit log transform. We conclude that non-linear responses play an important role in SMT, an observation that we hope will inform the efforts of feature engineers.
منابع مشابه
Locally Non-Linear Learning via Feature Induction and Structured Regularization in Statistical Machine Translation
Linear models, which support efficient learning and inference, are the workhorses of statistical machine translation; however, linear decision rules are less attractive from a modeling perspective. The combination of a simple learning technique and such a simple model means the overall learning process may ignore useful non-linear information present in the feature set. This places significant ...
متن کاملAlternating linearization for structured regularization problems
OF THE DISSERTATION Alternating linearization for structured regularization problems by Minh Pham Dissertation Director: Andrzej Ruszczyński Xiaodong Lin We adapt the alternating linearization method for proximal decomposition to structured regularization problems. The method is related to two well-known operator splitting methods, the Douglas-Rachford and the Peaceman-Rachford method, but it h...
متن کاملLarge-Margin Structured Prediction via Linear Programming
This paper presents a novel learning algorithm for structured classification, where the task is to predict multiple and interacting labels (multilabel) for an input object. The problem of finding a large-margin separation between correct multilabels and incorrect ones is formulated as a linear program. Instead of explicitly writing out the entire problem with an exponentially large constraint s...
متن کاملA Unifying Framework in Vector-valued Reproducing Kernel Hilbert Spaces for Manifold Regularization and Co-Regularized Multi-view Learning
This paper presents a general vector-valued reproducing kernel Hilbert spaces (RKHS) framework for the problem of learning an unknown functional dependency between a structured input space and a structured output space. Our formulation encompasses both Vector-valued Manifold Regularization and Co-regularized Multi-view Learning, providing in particular a unifying framework linking these two imp...
متن کاملLog-linear weight optimisation via Bayesian Adaptation in Statistical Machine Translation
We present an adaptation technique for statistical machine translation, which applies the well-known Bayesian learning paradigm for adapting the model parameters. Since state-of-the-art statistical machine translation systems model the translation process as a log-linear combination of simpler models, we present the formal derivation of how to apply such paradigm to the weights of the log-linea...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- TACL
دوره 2 شماره
صفحات -
تاریخ انتشار 2014